The data are at the protein-group level (from the ‘proteinGroups.txt’ file). A total of 4,313 protein groups were observed in at least one of the six samples. The experiment comprised two treatments, no mineral and mineral, with three replicate samples per treatment.
Abundance values were log2 transformed and all non-observed values were assigned a value of NA.
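The transformation described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis; it assumes that a non-observed abundance is encoded as 0 or as a missing value in the input, and the function name and example intensities are hypothetical.

```python
import math


def log2_abundance(raw):
    """Return log2(raw), or NaN when the value was not observed.

    Assumption: a non-observed abundance is encoded as None or as 0.
    """
    if raw is None or raw <= 0:
        return math.nan
    return math.log2(raw)


# Hypothetical raw intensities for one protein group across six samples.
raw_row = [1024.0, 0, 2048.0, None, 512.0, 4096.0]
log_row = [log2_abundance(value) for value in raw_row]
```

Representing non-observed values as NaN rather than 0 keeps them out of later means and medians, provided those steps skip NaN explicitly.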
All potential contaminants and reverse hits were removed. Additionally, any protein groups where 1) the majority protein identifier consisted only of orthologs or 2) the majority protein identifier had multiple protein groups associated with the organism listed were filtered from the data. Finally, any protein groups with too few observations to conduct a quantitative or qualitative statistical comparison were removed (i.e., a protein group was retained only if it had at least two observed values per treatment group or at least three observed values in one treatment group). Figure 1 shows the log2 transformed abundance profiles before (left) and after (right) filtering was performed. Filtering did not change the distributions of the abundance profiles. Table 1 gives the number of protein groups removed at each stage of filtering. The final dataset consisted of 2,808 protein groups.
Table 1: Number of protein groups removed by each filter applied to the data
| Filter | Number Removed |
|---|---|
| Contaminants | 11 |
| Reverse Hits | 61 |
| Orthologs/Double Hits | 475 |
| Observation Filter | 958 |
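The observation filter and the tally in Table 1 can be checked with a short sketch. The function below reflects one reading of the rule stated above (two or more observed values in each treatment for the quantitative test, or three or more in a single treatment for the qualitative test); the function name is hypothetical and this is not the code used for the analysis.

```python
def passes_observation_filter(n_obs_a, n_obs_b):
    """True when a protein group has enough observations to be tested.

    Quantitative comparison: at least two observed values in each treatment.
    Qualitative comparison: at least three observed values in one treatment.
    """
    quantitative_ok = n_obs_a >= 2 and n_obs_b >= 2
    qualitative_ok = n_obs_a >= 3 or n_obs_b >= 3
    return quantitative_ok or qualitative_ok


# Counts reported in Table 1, subtracted from the 4,313 starting protein groups.
removed = {
    "Contaminants": 11,
    "Reverse Hits": 61,
    "Orthologs/Double Hits": 475,
    "Observation Filter": 958,
}
remaining = 4313 - sum(removed.values())
```

With the counts in Table 1, `remaining` equals the 2,808 protein groups reported for the final dataset.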
Figure 1: Log2 abundance profiles for each sample before (left) and after (right) filtering
SPANS (Webb-Robertson et al. 2011) was run on the data to evaluate potential normalization strategies. Based on these results, data were normalized via median centering. Figure 2 shows the normalized log2 transformed abundance profiles for each sample.
Figure 2: Normalized log2 abundance profiles for each sample
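Median centering can be illustrated with a small sketch. It assumes the common form in which each sample's median over its observed log2 values is subtracted from that sample; whether the analysis also added back a global median is not stated, so that step is omitted. The function name and example values are hypothetical.

```python
import math
import statistics


def median_center(sample_values):
    """Subtract the sample median (over observed values) from every value.

    NaN marks a non-observed value and stays NaN after centering.
    """
    observed = [v for v in sample_values if not math.isnan(v)]
    sample_median = statistics.median(observed)
    return [v - sample_median for v in sample_values]


# Hypothetical log2 abundances for one sample; NaN marks a non-observed value.
sample = [10.0, math.nan, 12.0, 14.0]
centered = median_center(sample)
```

After centering, every sample has a median of zero over its observed values, which puts the profiles on a common scale for comparison.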
A one-way analysis of variance (ANOVA) was run for each protein group to compare mean abundances of samples from the two conditions. Additionally, a G-test (Webb-Robertson et al. 2010) was run to test for differences in presence/absence patterns with a null hypothesis that presence/absence patterns are not related to biological group. Figure 3 shows the number of significant protein groups by direction of expression change for both tests. Figure 4 gives a volcano plot showing the results from the ANOVA analyses.
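The presence/absence comparison can be sketched as a textbook G-test of independence on a 2x2 table (treatment by observed/not observed). This is an illustrative assumption about the form of the test; the formulation in Webb-Robertson et al. (2010) may differ in detail, and the function name is hypothetical.

```python
import math


def g_test_presence_absence(n_obs_a, n_a, n_obs_b, n_b):
    """G-test of independence on a 2x2 presence/absence table.

    Rows are the two treatments; columns are observed / not observed.
    Returns (G statistic, p-value from a chi-square with 1 degree of freedom).
    """
    table = [[n_obs_a, n_a - n_obs_a], [n_obs_b, n_b - n_obs_b]]
    total = n_a + n_b
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed == 0:
                continue  # 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / total
            g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Protein group observed in 0 of 3 No Mineral samples and 3 of 3 Mineral samples.
g, p = g_test_presence_absence(0, 3, 3, 3)
```

For this extreme pattern the p-value falls below 0.05, while a protein group observed in every sample gives G = 0 and a p-value of 1.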
Figure 3: Number of significant protein groups (p-value \(\leq\) 0.05) by test and direction of change
Figure 4: Volcano plot of ANOVA results. Protein groups with a p-value \(\leq\) 0.05 are colored red
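The quantities behind the volcano plot can be sketched for a single protein group. The example computes the one-way ANOVA F statistic for two groups and the difference of group means on the log2 scale, which is assumed here to correspond to the log2 fold-change; the function name and abundance values are hypothetical. Converting F to a p-value requires the F distribution, which is available in most statistics libraries and is left out of this sketch.

```python
import math
import statistics


def anova_two_groups(group_a, group_b):
    """One-way ANOVA F statistic for two groups, skipping NaN values.

    Returns (F statistic, within-group degrees of freedom, mean_b - mean_a).
    Needs at least two observed values per group and non-zero within-group variance.
    """
    a = [v for v in group_a if not math.isnan(v)]
    b = [v for v in group_b if not math.isnan(v)]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    grand_mean = statistics.fmean(a + b)
    ss_between = len(a) * (mean_a - grand_mean) ** 2 + len(b) * (mean_b - grand_mean) ** 2
    ss_within = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    df_within = len(a) + len(b) - 2
    # With two groups the between-group degrees of freedom equal 1.
    f_stat = ss_between / (ss_within / df_within)
    return f_stat, df_within, mean_b - mean_a


# Hypothetical normalized log2 abundances for one protein group.
no_mineral = [10.0, 11.0, 12.0]
mineral = [13.0, 14.0, 15.0]
f_stat, df_within, log2_fc = anova_two_groups(no_mineral, mineral)
```

With only two groups, the one-way ANOVA is equivalent to a pooled-variance two-sample t-test (F equals the square of the t statistic), so the direction of change comes from the sign of the mean difference rather than from F itself.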
Filtered and normalized data are in the file ‘fusarium_normalized_data.csv’. Statistical results are in the file ‘fusarium_stat_results.csv’. Table 2 gives the names of the columns in the statistical results file and a description of the values in each column.
Table 2: Description of columns in statistical results file
| Column | Description |
|---|---|
| Majority.protein.IDs | from original data output |
| Protein.IDs | from original data output |
| Peptide.counts..unique | from original data output |
| NObs_NoMineral | number of samples from No Mineral treatment with observed abundance |
| NObs_Mineral | number of samples from Mineral treatment with observed abundance |
| Mean_NoMineral | mean normalized log2 abundance for No Mineral samples |
| Mean_Mineral | mean normalized log2 abundance for Mineral samples |
| pvalue_Gtest_MvsNoM | G-test p-value comparing presence/absence patterns |
| pvalue_ANOVA_MvsNoM | ANOVA p-value comparing mean abundances |
| Log2FC_MvsNoM | Log2 fold-change of group means (M/NoM) |
| Flag_0.05_ANOVA_MvsNoM | Flag indicating direction of quantitative change (0: not sig. different, 1: sig. up expressed in Mineral, -1: sig. down expressed in Mineral) |
| Flag_0.05_Gtest_MvsNoM | Flag indicating direction of qualitative change (0: not sig. different, 1: observed more in Mineral, -1: observed less in Mineral) |
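The flag columns in Table 2 can be read as a p-value threshold combined with a direction. The sketch below is an assumption about how such flags could be derived, not the code that produced the file: the ANOVA flag takes its sign from `Log2FC_MvsNoM`, and the G-test flag is assumed to take its sign from `NObs_Mineral - NObs_NoMineral`. The function name is hypothetical.

```python
import math


def direction_flag(p_value, effect, alpha=0.05):
    """Return 1, -1, or 0 following the flag convention in Table 2.

    effect > 0 means higher (or more often observed) in Mineral.
    A missing p-value (None or NaN) is treated as not significant.
    """
    if p_value is None or math.isnan(p_value) or p_value > alpha:
        return 0
    if effect > 0:
        return 1
    if effect < 0:
        return -1
    return 0


# Hypothetical rows: (p-value, effect) pairs.
anova_flag = direction_flag(0.01, 1.5)   # significant, higher in Mineral
gtest_flag = direction_flag(0.03, -2)    # significant, observed less in Mineral
```

The threshold is inclusive (p-value less than or equal to 0.05), matching the captions of Figures 3 and 4.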
Data are visualized as boxplots of log2 abundance by treatment. All plots are collected into a trelliscope display, which allows you to cycle through all protein groups and to filter, sort, and label plots by values such as p-values, fold-changes, and protein names. These metrics are named similarly to the columns in the statistical results file.
Sequential projection pursuit principal component analysis (PCA) was run (Webb-Robertson et al. 2013); this method provides the benefit that missing data does not need to be imputed for the algorithm to run. Figure 5 shows the first two principal component scores for each sample with points colored by group.
Figure 5: Scores for the first two principal components, based on normalized protein group abundance profiles, for each sample with points colored by group
Figures 6-8 compare each protein group from the organism of interest with properties of the orthologs mapping to the same protein group. All plots are interactive, and points can be toggled on and off by clicking on the legend markers. Figure 6 shows the number of peptides mapping to the organism of interest for a protein group (x-axis) and the number of peptides from the ortholog with the maximum number of peptides mapping to the same protein group (y-axis); points are colored by direction of change based on ANOVA. Figure 7 gives a similar plot, but with the total number of peptides mapping to the ortholog(s) on the y-axis. Finally, Figure 8 is also similar, but with the total number of ortholog proteins on the y-axis.
Figure 6: Number of peptides mapping to organism vs the number of peptides associated with the ortholog with the maximum number of peptides. All points are colored by direction of change based on ANOVA results
Figure 7: Number of peptides mapping to organism vs the total number of peptides associated with all ortholog proteins. All points are colored by direction of change based on ANOVA results
Figure 8: Number of peptides mapping to organism vs the total number of ortholog proteins. All points are colored by direction of change based on ANOVA results
Webb-Robertson, Bobbie-Jo M, Melissa M Matzke, Jon M Jacobs, Joel G Pounds, and Katrina M Waters. 2011. “A Statistical Selection Strategy for Normalization Procedures in LC-MS Proteomics Experiments Through Dataset-Dependent Ranking of Normalization Scaling Factors.” Proteomics 11 (24): 4736–41.
Webb-Robertson, Bobbie-Jo M, Melissa M Matzke, Thomas O Metz, Jason E McDermott, Hyunjoo Walker, Karin D Rodland, Joel G Pounds, and Katrina M Waters. 2013. “Sequential Projection Pursuit Principal Component Analysis–Dealing with Missing Data Associated with New-Omics Technologies.” Biotechniques 54 (3): 165–68.
Webb-Robertson, Bobbie-Jo M, Lee Ann McCue, Katrina M Waters, Melissa M Matzke, Jon M Jacobs, Thomas O Metz, Susan M Varnum, and Joel G Pounds. 2010. “Combined Statistical Analyses of Peptide Intensities and Peptide Occurrences Improves Identification of Significant Peptides from MS-Based Proteomics Data.” Journal of Proteome Research 9 (11): 5748–56.